118 research outputs found

    A cost-effective clustered architecture

    Get PDF
    In current superscalar processors, all floating-point resources are idle during the execution of integer programs. As previous works show, this problem can be alleviated if the floating-point cluster is extended to execute simple integer instructions. With minor hardware modifications to a conventional superscalar processor, the issue width can potentially be doubled without increasing the hardware complexity. In fact, the result is a clustered architecture with two heterogeneous clusters. We propose to extend this architecture with a dynamic steering logic that sends the instructions to either cluster. The performance of clustered architectures depends on the inter-cluster communication overhead and the workload balance. We present a scheme that uses run-time information to optimise the trade-off between these figures. The evaluation shows that this scheme can achieve an average speed-up of 35% over a conventional 8-way issue (4 int+4 fp) machine and that it outperforms the previously proposed one.Peer ReviewedPostprint (published version

    Replacing 6T SRAMs with 3T1D DRAMs in the L1 data cache to combat process variability

    Get PDF
    With continued technology scaling, process variations will be especially detrimental to six-transistor static memory structures (6T SRAMs). A memory architecture using three-transistor, one-diode DRAM (3T1D) cells in the L1 data cache tolerates wide process variations with little performance degradation, making it a promising choice for on-chip cache structures for next-generation microprocessors.Peer ReviewedPostprint (published version

    Statistical analysis and comparison of 2T and 3T1D e-DRAM minimum energy operation

    Get PDF
    Bio-medical wearable devices restricted to their small-capacity embedded-battery require energy-efficiency of the highest order. However, minimum-energy point (MEP) at sub-threshold voltages is unattainable with SRAM memory, which fails to hold below 0.3V because of its vanishing noise margins. This paper examines the minimum-energy operation point of 2T and 3T1D e-DRAM gain cells at the 32-nm technology node with different design points: up-sizing transistors, using high- V th transistors, read/write wordline assists; as well as operating conditions (i.e., temperature). First, the e-DRAM cells are evaluated without considering any process variations. Then, a full-factorial statistical analysis of e-DRAM cells is performed in the presence of threshold voltage variations and the effect of upsizing on mean MEP is reported. Finally, it is shown that the product of the read and write lengths provides a knob to tradeoff energy-efficiency for reliable MEP energy operation.Peer ReviewedPostprint (author's final draft

    A survey of machine and deep learning methods for privacy protection in the Internet of things

    Get PDF
    Recent advances in hardware and information technology have accelerated the proliferation of smart and interconnected devices facilitating the rapid development of the Internet of Things (IoT). IoT applications and services are widely adopted in environments such as smart cities, smart industry, autonomous vehicles, and eHealth. As such, IoT devices are ubiquitously connected, transferring sensitive and personal data without requiring human interaction. Consequently, it is crucial to preserve data privacy. This paper presents a comprehensive survey of recent Machine Learning (ML)- and Deep Learning (DL)-based solutions for privacy in IoT. First, we present an in depth analysis of current privacy threats and attacks. Then, for each ML architecture proposed, we present the implementations, details, and the published results. Finally, we identify the most effective solutions for the different threats and attacks.This work is partially supported by the Generalitat de Catalunya under grant 2017 SGR 962 and the HORIZON-GPHOENIX (101070586) and HORIZON-EUVITAMIN-V (101093062) projects.Peer ReviewedPostprint (published version

    A Real-Time Error Detection (RTD) architecture and its use for reliability and post-silicon validation for F/F based memory arrays

    Get PDF
    This work proposes in-situ Real-Time Error Detection (RTD): embedding hardware in a memory array for detecting a fault in the array when it occurs, rather than when it is read. RTD breaks the serialization between data access and error-detection and, thus, it can speed-up the access-time of arrays that use in-line error-correction. The approach can also reduce the time needed to root-cause array related bugs during post-silicon validation and product testing. The paper introduces a two-dimensional error-correction scheme based on RTD and, also, presents a proactive error-correction method that combines RTD with demand-scrubbing. The work describes how to build RTD into a memory array with flip-flops to track in real-time the column-parity. A comparison of the proposed two-dimensional ECC scheme, as compared to single-error-correction-double-error-detection, shows that the RTD design has comparable error-detection-and-correction strength and, depending on the array dimensions and configuration, RTD reduces access time by 4% to 26% at an area and power overhead (negative is a reduction) between -7% to 33% and -42% to 86% respectively.Peer ReviewedPostprint (author's final draft

    Software-controlled operand-gating

    Get PDF
    Operand gating is a technique for improving processor energy efficiency by gating off sections of the data path that are unneeded by short-precision (narrow) operands. A method for implementing software-controlled power gating is proposed and evaluated. The instruction set architecture (ISA) is enhanced to include opcodes that specify operand widths (if not already included in the ISA). A compiler or a binary translator uses statically available information to determine initial value ranges. The technique is enhanced through a profile-based analysis that results in the specialization of certain code regions for a given value range. After the analysis, instruction opcodes are assigned using the minimum required width. To evaluate this technique the Alpha instruction set is enhanced to include opcodes for 8, 16, and 32 bit operands. Applying the proposed software technique to the Speclnt95 benchmarks results in energy-delay savings of 14%. When combined with previously proposed hardware-based techniques, the energy-delay benefit is 28%.Peer ReviewedPostprint (published version

    Strategies to enhance the 3T1D-DRAM cell variability robustness beyond 22 nm

    Get PDF
    3T1D cell has been stated as a valid alternative to be implemented on L1 memory cache to substitute 6T, highly affected by device variability as technology dimensions are reduced. In this work, we have shown that 22 nm 3T1D memory cells present significant tolerance to high levels of device parameter fluctuation. Moreover, we have observed that when variability is considered the write access transistor becomes a significant detrimental element on the 3T1D cell performance. Furthermore, resizing and temperature control have been presented as some valid strategies in order to mitigate the 3T1D cell variability.Peer ReviewedPostprint (author's final draft

    Lightweight protection of cryptographic hardware accelerators against differential fault analysis

    Get PDF
    © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Hardware acceleration circuits for cryptographic algorithms are largely deployed in a wide range of products. The HW implementations of such algorithms often suffer from a number of vulnerabilities that expose systems to several attacks, e.g., differential fault analysis (DFA). The challenge for designers is to protect cryptographic accelerators in a cost-effective and power-efficient way. In this paper, we propose a lightweight technique for protecting hardware accelerators implementing AES and SHA-2 (which are two widely used NIST standards) against DFA. The proposed technique exploits partial redundancy to first detect the occurrence of a fault and then to react to the attack by obfuscating the output values. An experimental campaign demonstrated that the overhead introduced is 8.32% for AES and 3.88% for SHA-2 in terms of area, 0.81% for AES and 12.31% for SHA-2 in terms of power with no working frequency reduction. Moreover, a comparative analysis showed that our proposal outperforms the most recent related countermeasures.Peer ReviewedPostprint (author's final draft

    Review on suitable eDRAM configurations for next nano-metric electronics era

    Get PDF
    We summarize most of our studies focused on the main reliability issues that can threat the gain-cells eDRAM behavior when it is simulated at the nano-metric device range has been collected in this review. So, to outperform their memory cell counterparts, we explored different technological proposals and operational regimes where it can be located. The best memory cell performance is observed for the 3T1D-eDRAM cell when it is based on FinFET devices. Both device variability and SEU appear as key reliability issues for memory cells at sub-22nm technology node.Peer ReviewedPostprint (author's final draft
    • …
    corecore